Form similarity via Levenshtein distance between ortho-filtered logarithmic ruling-gap ratios

Authors

George Nagy

Daniel P. Lopresti

Abstract

Geometric invariants are combined with edit distance to compare the ruling configuration of noisy filled-out forms. It is shown that gap-ratios used as features capture most of the ruling information of even low-resolution and poorly scanned form images, and that the edit distance is tolerant of missed and spurious rulings. No preprocessing is required and the potentially time-consuming string operations are performed on a sparse representation of the detected rulings. Based on edit distance, 158 Arabic forms are classified into 15 groups with 89% accuracy. Since the method was developed for an application that precludes public dissemination of the data, it is illustrated on public-domain death certificates.
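As an illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes ruling positions are already available as pixel coordinates along one axis; the function names, the quantization step `step`, and the unit edit costs are hypothetical choices made only for this example. It shows why ratios of consecutive gaps are invariant to scale and how edit distance compares two ruling sequences of different lengths.

```python
import math

def log_gap_ratios(positions, step=0.25):
    """Turn ruling positions into quantized log ratios of consecutive gaps.

    `step` is an assumed quantization bin width, not a value from the paper.
    """
    pos = sorted(positions)
    # Gaps between neighbouring rulings; zero-width gaps are skipped.
    gaps = [b - a for a, b in zip(pos, pos[1:]) if b > a]
    # The ratio of adjacent gaps does not change when the image is scaled.
    return [round(math.log(g2 / g1) / step) for g1, g2 in zip(gaps, gaps[1:])]

def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

# Hypothetical ruling positions: the same form at two scales, and one
# scan in which a ruling was missed.
form_a = [10, 20, 40, 50]
form_a_scaled = [30, 60, 120, 150]
form_a_missed = [10, 40, 50]

d_scaled = levenshtein(log_gap_ratios(form_a), log_gap_ratios(form_a_scaled))
d_missed = levenshtein(log_gap_ratios(form_a), log_gap_ratios(form_a_missed))
```

In this sketch the scaled copy yields the same symbol sequence as the original, so its distance is zero, while a missed ruling perturbs only the symbols next to it. A classifier could then assign a form to the group of its nearest reference under this distance, though the abstract does not say which decision rule the authors used.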


Similar resources

Cross-language Phonetic Similarity Measure on Terms Appeared in Asian Languages

This study aims to develop a phonetic similarity measurement method across Asian languages. The method, cross-language similarity algorithm aggregates the transcription of language-specific Romanization, the International Phonetic Alphabet, the Soundex algorithm, and Levenshtein distance. To evaluate the proposed algorithm, this study involves an experiment using ninety-two chemical element nam...


Lexical similarity can distinguish between automatic and manual translations

We consider the problem of identifying automatic translations from manual translations of the same sentence. Using two different similarity metrics (BLEU and Levenshtein edit distance), we found out that automatic translations are closer to each other than they are to manual translations. We also use phylogenetic trees to provide a visual representation of the distances between pairs of individ...


Mutual intelligibility of Chinese dialects: Predicting cross-dialect word intelligibility from lexical and phonological similarity

This paper aims to predict mutual intelligibility (defined here as cross-dialectal word recognition) between 15 Chinese dialects from lexical and phonological distance measures. Distances were measured on the stimulus materials used in the experiment. Their predictive power was compared with earlier similar distance measures based on large word lists. Predictors based on just the stimulus mater...


Adaptive String Distance Measures for Bilingual Dialect Lexicon Induction

This paper compares different measures of graphemic similarity applied to the task of bilingual lexicon induction between a Swiss German dialect and Standard German. The measures have been adapted to this particular language pair by training stochastic transducers with the Expectation-Maximisation algorithm or by using handmade transduction rules. These adaptive metrics show up to 11% F-measure ...


A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words

We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and the phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and weighted Levenshtein distance. The experimental results show an 11-pt interpo...




Publication date: 2014
